# Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning


- Please download the dynamics and reward models used for [BA-MBRL](https://drive.google.com/drive/folders/1FlzAaJkOs7WKM73kJmpYkelFMVdShiYL?usp=sharing), [BA-MCTS](https://drive.google.com/drive/folders/14MxBjHX9eaiO5xCOjyUDSiqFMS0EKa2y?usp=sharing), and [BA-MCTS-SL](https://drive.google.com/drive/folders/14MxBjHX9eaiO5xCOjyUDSiqFMS0EKa2y?usp=sharing) into the folder 'data'.

- The training time of each algorithm can be obtained from the corresponding TensorBoard files. The raw data for our Table 6 is recorded in the file 'record_run_time.py'. Run the following command to compute the statistics:
    ```bash
    python record_run_time.py
    ```

- To replicate Table 9, please run:
    ```bash
    python test_belief.py --seed 0 --yaml_file args_yaml/ba_mcts/Z.yml --uuid E --load_model_dir F
    ```

    - Z is the name of a YAML file in 'args_yaml/ba_mcts'; each file corresponds to one D4RL MuJoCo task;

    - E specifies the name of the subfolder to store the training results;

    - F is the path to the dynamics and reward model (stored in the folder 'data'), which should correspond to the evaluation environment (specified by Z).

- To reproduce Table 10:

    - The raw data for Table 10 is stored in 'make_tables.py'. You can run the following command to reproduce the statistics:
        ```bash
        python make_tables.py
        ```
    
    - The raw data is computed from imaginary rollouts. Please download the rollout data from [traj_data](https://drive.google.com/drive/folders/1McVG1jPklsusxYi_zt927axHxJ5ndXy2?usp=sharing) to the folder 'traj_data'.

    - To obtain the raw data from the imaginary rollouts, run the following command. You need to manually set the argument of the main function in 'test_traj_dst.py' to the name of the data file. Data files for adaptive models are named 'XXX_seed_Y.pkl' and those for uniform models 'XXX_uniform_prior_ensemble_seed_Y.pkl', where 'XXX' is the environment name and 'Y' is the seed number.
        ```bash
        cd traj_data
        python test_traj_dst.py
        ```
    
    - The rollout data can be generated by running 'test_belief.py' as described for Table 9.

- To reproduce Figure 4:

    - Please run:
        ```bash
        cd traj_data
        python plot_belief.py
        ```
    
    - To draw Figures 4(b), 4(c), 4(d), you need to change the argument of the function main_2 in 'plot_belief.py' to t_id=0, t_id=22, t_id=495, respectively.
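
For Table 6, the training time of a run can be read off the wall-clock timestamps that TensorBoard stores with every logged scalar. The sketch below is illustrative only and is not the logic of 'record_run_time.py': it assumes that a run's training time is the span between its first and last logged timestamps, and that the reported statistics are the mean and sample standard deviation over seeds. `run_duration_hours` and `summarize` are hypothetical helpers.

```python
import statistics
from typing import Dict, List, Sequence


def run_duration_hours(wall_times: Sequence[float]) -> float:
    """Training time of one run in hours (hypothetical helper).

    Assumes `wall_times` are Unix timestamps in seconds, e.g. the `wall_time`
    field of the scalar events a run logged to TensorBoard.
    """
    if len(wall_times) < 2:
        raise ValueError("need at least two timestamps to measure a duration")
    return (max(wall_times) - min(wall_times)) / 3600.0


def summarize(runs: Dict[str, List[Sequence[float]]]) -> Dict[str, str]:
    """Format each algorithm's run times over seeds as 'mean ± std' (hypothetical helper)."""
    table = {}
    for algo, per_seed in runs.items():
        hours = [run_duration_hours(ts) for ts in per_seed]
        mean = statistics.mean(hours)
        # Sample standard deviation needs two or more seeds; a single seed reports 0.
        std = statistics.stdev(hours) if len(hours) > 1 else 0.0
        table[algo] = f"{mean:.2f} ± {std:.2f}"
    return table
```

With the `tensorboard` package installed, the timestamps of one run can be collected with `acc = EventAccumulator(logdir)` (from `tensorboard.backend.event_processing.event_accumulator`), then `acc.Reload()` and `[e.wall_time for e in acc.Scalars(tag)]`, where `tag` is any scalar logged throughout training.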
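
For Table 9, the placeholders Z, E and F can also be filled in programmatically when several tasks are evaluated. `build_belief_commands` below is a hypothetical helper, not part of the repository: the caller supplies the mapping from each YAML file stem (Z) to its model folder under 'data' (F), because the folder naming is not documented here, and reusing the task name as the results subfolder (E) is an arbitrary choice. The helper only assembles the argument lists; each can then be executed with `subprocess.run(cmd, check=True)`.

```python
from typing import Dict, List


def build_belief_commands(
    model_dirs: Dict[str, str],
    yaml_dir: str = "args_yaml/ba_mcts",
    seed: int = 0,
) -> List[List[str]]:
    """Return one `python test_belief.py ...` argument list per task (hypothetical helper).

    `model_dirs` maps a YAML file stem (Z) to its model path (F); the task
    name doubles as the results subfolder (E). Both choices are assumptions.
    """
    commands = []
    for task, model_dir in sorted(model_dirs.items()):
        commands.append([
            "python", "test_belief.py",
            "--seed", str(seed),
            "--yaml_file", f"{yaml_dir}/{task}.yml",
            "--uuid", task,
            "--load_model_dir", model_dir,
        ])
    return commands
```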
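
For Table 10, the naming convention of the rollout files can be captured in a small helper, which may be convenient when choosing the file name to pass to the main function of 'test_traj_dst.py'. This is an illustrative sketch only: the environment name is whatever string 'XXX' stands for in the downloaded files, and the assumption that each '.pkl' file can be read with the standard `pickle` module is ours, since the structure of the stored object is not documented here.

```python
import pickle
import re
from pathlib import Path
from typing import Any, Optional, Tuple, Union

# Files for uniform models carry an extra infix before the seed number.
_UNIFORM_INFIX = "_uniform_prior_ensemble"
_PATTERN = re.compile(
    rf"^(?P<env>.+?)(?P<uniform>{re.escape(_UNIFORM_INFIX)})?_seed_(?P<seed>\d+)\.pkl$"
)


def rollout_filename(env: str, seed: int, uniform: bool = False) -> str:
    """'XXX_seed_Y.pkl' for adaptive models, 'XXX_uniform_prior_ensemble_seed_Y.pkl' for uniform ones."""
    infix = _UNIFORM_INFIX if uniform else ""
    return f"{env}{infix}_seed_{seed}.pkl"


def parse_rollout_filename(name: str) -> Optional[Tuple[str, int, bool]]:
    """Inverse of `rollout_filename`: (env, seed, uniform), or None if the name does not match."""
    m = _PATTERN.match(name)
    if m is None:
        return None
    return m.group("env"), int(m.group("seed")), m.group("uniform") is not None


def load_rollouts(path: Union[str, Path]) -> Any:
    """Load one rollout file, assuming it was written with `pickle`."""
    with open(path, "rb") as f:
        return pickle.load(f)
```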
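
For Figure 4, 'plot_belief.py' plots the belief at selected rollout steps `t_id`. Purely as background, and under assumptions of our own that are not taken from the repository (the belief is a categorical distribution over the ensemble members, each member predicts the next state with a diagonal Gaussian, and the belief starts from a uniform prior and is updated by Bayes' rule after every observed transition), the belief at each step could be computed as below. Both functions are hypothetical.

```python
import numpy as np


def gaussian_log_likelihood(x, mean, std):
    """Log-density of x under independent Gaussians: (D,), (K, D), (K, D) -> (K,)."""
    var = std ** 2
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)


def belief_trajectory(next_states, means, stds):
    """Posterior over K models after each of T observed transitions.

    next_states: (T, D) observed next states.
    means, stds: (T, K, D) per-model Gaussian predictions at each step.
    Returns a (T + 1, K) array; row 0 is the uniform prior and row t the
    belief after t Bayes updates.
    """
    T, K, _ = means.shape
    log_b = np.full(K, -np.log(K))  # uniform prior, kept in log space
    beliefs = [np.exp(log_b)]
    for t in range(T):
        # Bayes' rule: posterior is proportional to prior times likelihood.
        log_b = log_b + gaussian_log_likelihood(next_states[t], means[t], stds[t])
        log_b -= np.logaddexp.reduce(log_b)  # normalise without overflow
        beliefs.append(np.exp(log_b))
    return np.stack(beliefs)
```

Working in log space keeps the repeated products of small likelihoods from underflowing over long rollouts.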

    

 

